Abstract

The replication crisis has eroded the public’s trust in science. Many famous studies, even those published in renowned journals, fail to produce the same results when replicated by other researchers. While this outcome has several causes, one aspect has received particular attention: reproducibility. The term reproducible research refers to studies that contain all materials necessary for other researchers to reproduce the scientific results. This allows others to identify flaws in calculations and improves scientific rigor. In this paper, we present a workflow for reproducible research using the R language and a set of additional packages and tools that simplify a reproducible research procedure.

1 Introduction

The scientific database Scopus lists over 73,000 entries for the search term “reproducible research” at the time of writing. The importance of making research reproducible was recognized as early as the 1950s in multiple research subjects. With the Reproducibility Project, the Open Science Collaboration (Open Science Collaboration and others 2015) found that only about half of all studies conducted in psychological research could be replicated by other researchers. Several factors have contributed to this problem. From a high-level perspective, the pressure to publish and the increase in scientific output have led to a plethora of findings that will not replicate. Both bad research design and (possibly unintentional) bad research practices have increased the number of papers that hold little to no value. According to Baker (2016) and her article in Nature, more than half of researchers agree that there is a severe reproducibility crisis in science. The study also found that obstacles to reproducibility include a lack of available analysis code, a lack of available raw data, and problems with reproduction efforts themselves.

2 Problematic Research Practices

One frequently mentioned problem is HARKing (Kerr 1998), or “hypothesizing after results are known”. When multiple statistical tests are conducted at a conventional alpha-error rate (e.g., \(\alpha = .05\)), it is expected that some tests will reject the null hypothesis by mere chance alone; this is precisely what the error rate describes. If researchers then claim that these findings were their initial hypotheses, the results will be indiscernible from randomness. However, this is unknown to the reviewer or reader, who only hears about the new hypotheses. HARKing produces findings where there are none. It is thus crucial to determine the research hypothesis before collecting (or analyzing) the data.
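The inflation of false positives from multiple testing is easy to demonstrate. The following simulation (our own illustration, not from any cited study) runs 1,000 t-tests on pure noise:

```r
# 1,000 two-sample t-tests on pure noise: no real effects exist,
# yet roughly 5% of the tests reject the null at alpha = .05.
set.seed(42)  # arbitrary seed, chosen only for reproducibility
p_values <- replicate(1000, t.test(rnorm(30), rnorm(30))$p.value)
false_pos <- mean(p_values < 0.05)
false_pos  # approximately 0.05
```

Reporting only these “significant” tests, as if they had been hypothesized in advance, is exactly what HARKing amounts to.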

Another strategy, often applied without ill intent, is p-hacking (Head et al. 2015). This technique is widespread in scientific publications and may already be shifting consensus in science. p-hacking refers to techniques that alter the data until the desired p-value is reached. Omitting individual outliers, creating different grouping variables, adding or removing control variables: all of these can be considered p-hacking. This process likewise leads to results that will not hold under replication. It is crucial to document what modifications have been performed on the data so that the interpretability of p-values can be evaluated.
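How strongly such choices inflate false positives can be sketched with a small simulation (again our own illustration): on data with no true effect, we keep deleting the most extreme “outlier” until the test turns significant.

```r
# p-hacking by outlier removal: on null data, drop the most extreme
# observation (up to 5 times) until p < .05 is reached.
set.seed(1)
hacked_test <- function(n = 30, alpha = 0.05, max_drops = 5) {
  x <- rnorm(n)                       # null data: true mean is zero
  for (i in 0:max_drops) {
    if (t.test(x)$p.value < alpha) return(TRUE)
    x <- x[-which.max(abs(x))]        # delete the worst "outlier"
  }
  FALSE
}
rate <- mean(replicate(1000, hacked_test()))
rate  # noticeably above the nominal 5%
```

Every deletion shrinks the standard deviation and gives the test another chance to cross the threshold, so the realized error rate exceeds the nominal one.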

Researchers already “massage” the data to attain better p-values; worse still, many researchers do not understand the meaning of p-values in the first place. As Colquhoun (2017) found, many researchers misinterpret p-values and thus frame their findings as much stronger than they really are. Adequate reporting of p-values is therefore just as important for the interpretability of results.

Lastly, scientific journals face the problem that they are mostly interested in publishing significant results, so contradictory “non-findings” seldom get published in renowned journals. There is little “value” for a researcher in publishing non-significant findings, as the additional work of writing a manuscript for something like arXiv often does not reap the same reward as a journal publication. This so-called publication bias (Simonsohn, Nelson, and Simmons 2014) worsens the crisis, as only significant findings become available. It is thus necessary to simplify the process of publishing non-significant results.

3 Reproducible Research Workflows

Many different solutions have been proposed to address these challenges (e.g., Marwick, Boettiger, and Mullen 2018; Wilson et al. 2017). However, no uniform process exists that allows creating documents and alternative reproducibility materials in one workflow.

In this paper, we demonstrate a research workflow based on the R language and the R Markdown format. This paper was written using this workflow, and the sources are freely available online (https://www.osf.io/kcbj5). Our workflow directly addresses the challenge of writing LNCS papers and produces a companion paper website (https://sumidu.github.io/reproducibleR/) that includes additional material and downloadable data.

In this paper, we will focus on the following aspects:

  • Creating a reproducible research compendium using RMarkdown
  • Using GitHub and the OSF to make research accessible
  • Packages that simplify research in RStudio

We assume that the reader is somewhat familiar with the R programming language and knows that scientific analyses can be run using computational tools such as R, Python, Julia, or others. The guidance in this paper addresses the R user.

3.1 What is reproducibility?

The Open Science Framework (OSF) speaks of three different kinds of reproducibility (Meyers 2017). Computational reproducibility means that other researchers who get access to your code and data are able to reproduce your results. Empirical reproducibility means that your research contains sufficient information for other researchers to recreate your experiments and repeat your study. Replicability refers to the quality of an outcome of a study: if the experiment were reproduced, it would reach the same outcome. In this article, we provide tools for the first type of reproducibility only, as the latter two depend on your research content, not exclusively on your procedure. It is important to note that creating computationally reproducible research matters, but it is also worthless when basic concepts of methods and research processes are ignored. If you measure incorrectly, your result may reproduce, but the finding may still be wrong. Hopefully, others will then be able to point this out to you more easily.

4 Writing a Research Compendium

The central aim of a research compendium is to provide all data and information necessary to allow others to reproduce your findings from your data (Gentleman and Temple Lang 2007). There are several different ways of achieving this, but a central theme of research compendia is to organize data in a meaningful fashion. Since we are addressing R users, it makes sense to consider possible computing environments for R first.

You can find further detailed information on how to create research compendia online: https://research-compendium.science/

4.1 Why R and RMarkdown?

R is the de-facto standard among statistical analysis tools that are open source and free to use. In economics and the social sciences, similar tools that provide a GUI, such as SPSS, are common, with one immediate downside for reproducibility: if your analysis toolkit is proprietary, other users will not be able to reproduce your work without a significant investment.

Moreover, using a GUI makes it hard to trace later, even for yourself, which analyses you have conducted. You might have manually deleted a row with broken data, or might have manually recoded a typing error in your data. If this is not documented, this information is lost. Using a language like R, where every change to the data corresponds to a line of code, no accidental “quick fixes” will get lost over time.
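As a minimal sketch with made-up data, the same “quick fixes” expressed as code remain permanently visible in the script:

```r
# Hypothetical raw data with one broken reading and one typo.
survey <- data.frame(
  id    = 1:4,
  age   = c(25, 34, -1, 29),                  # -1: broken sensor value
  group = c("ctrl", "treat", "ctrl", "tret")  # "tret" is a typo
)

survey <- survey[survey$age >= 0, ]              # documented: drop broken row
survey$group[survey$group == "tret"] <- "treat"  # documented: fix the typo
nrow(survey)  # 3
```

Anyone rerunning the script sees exactly which rows were removed and which values were recoded, and why.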

4.2 Literate Programming

RMarkdown is an extremely helpful tool for researchers, as it allows combining analysis code with regular text. This document was written using RMarkdown, integrating some analysis code in between. RMarkdown is a literate programming approach: the documentation of code is as necessary for understanding the code as the code itself. By interleaving code and text, the intentions of the developer are implicitly communicated. Python and Julia offer similar approaches through Jupyter notebooks.
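A minimal RMarkdown fragment (our own illustration) shows how prose, inline code, and a code chunk interleave:

````markdown
The mean sepal length is `r round(mean(iris$Sepal.Length), 1)` cm.

```{r sepal-hist, echo=FALSE}
hist(iris$Sepal.Length)
```
````

When the document is rendered, the inline expression is replaced by its computed value and the chunk's figure is embedded in place, so text and results can never drift apart.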

RMarkdown not only allows integrating text and figures directly from code, it also allows writing in an abstract format. A single document (such as this one) can be rendered to various output formats. In this case, it is rendered to the LNCS-styled LaTeX output format as well as to a website using Bootstrap. The benefit is that text and code are reusable, so when papers get rejected, no excessive reformatting is necessary. Formatting is done using Markdown (see here[^1] for a tutorial).

[^1]: https://www.markdowntutorial.com/
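The choice of output formats lives in the document's YAML header. A minimal sketch (using the standard rmarkdown format names, not the exact templates used for this paper) might look like:

```yaml
title: "A Reproducible Research Workflow"
output:
  pdf_document: default    # e.g., swapped for an LNCS LaTeX template
  html_document: default   # e.g., a Bootstrap-based website theme
```

Rendering the same source to another venue's style then only requires changing this header, not the text or code.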

4.3 Project workflows

The most popular integrated development environment (IDE) for R is RStudio. RStudio comes with a license that allows researchers to use it freely for scientific purposes, and it integrates many of the tools described in this paper. The first powerful tool for reproducible research with R is RStudio projects.

RStudio projects contain information about where your code, your data, and your output reside on your computer. The benefit of RStudio projects is that they store relative path information, so when another user installs your project on their computer, it should work without a problem. Since you sometimes still need to refer to files by path, the here package provides a helpful tool to access data relative to the project main directory. This works on Linux, Windows, and Mac computers.
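For portable path handling, base R's file.path() joins path components with the correct separator for the operating system; the here package (assumed to be installed) additionally anchors paths at the project root. A short sketch:

```r
# Build a path portably instead of hard-coding "/" or "\\".
data_file <- file.path("data", "raw", "survey.csv")

# With the here package, the same path resolves from the project root
# regardless of the current working directory:
# library(here)
# data_file <- here("data", "raw", "survey.csv")
data_file
```

The commented here() variant is especially useful in RMarkdown documents, whose working directory during rendering may differ from the project root.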

4.4 Package management

Several tools help manage package versions and computing environments: packrat and its successor renv snapshot the exact package versions used in a project, Docker images capture the entire computing environment, and rstudio.cloud hosts projects in a fixed environment.
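As one concrete option, the renv workflow pins package versions per project. A sketch of the typical commands (run interactively inside an RStudio project, shown here as comments):

```r
# renv workflow sketch:
# renv::init()      # create a project-local package library
# renv::snapshot()  # record exact package versions in renv.lock
# renv::restore()   # reinstall the recorded versions on another machine
```

Committing the generated renv.lock file alongside the code lets collaborators recreate the same package environment.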

4.5 Data Sharing and Anonymization

4.6 Writing Articles using rmdtemplates


  • Creating a research compendium (Gentleman and Temple Lang 2007)
  • Creating a project oriented workflow
  • Literate programming
  • Use of packrat for package management
  • Anonymization and Data replacement using sdcMicro (Templ, Meindl, and Kowarik 2020)
  • Creating LNCS Papers using rmdtemplates (Calero Valdez 2020)

5 Open Data and open code

  • The use of version control and a public repository (Bryan 2018)
  • Creating a project README using R Markdown
  • Making use of GitHub Pages for companion websites
  • Use of the OSF for preregistration
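Several of these steps can be initiated directly from R with the usethis package; a sketch of the interactive setup (shown as comments, since these commands prompt the user and create files):

```r
# One-time project setup sketch with usethis:
# usethis::use_git()         # put the project under Git version control
# usethis::use_github()      # create and link a GitHub repository
# usethis::use_readme_rmd()  # add a README.Rmd rendered to README.md
```

From there, pushing the repository and enabling GitHub Pages yields a public companion website with minimal manual work.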

6 Automating builds using drake

library(drake)
plan <- drake_plan(hist = hist(iris$Sepal.Length))  # declare the targets
make(plan)     # build only targets that are outdated
readd("hist")  # retrieve the cached result

  • Several helpful packages for a reproducible workflow (here (Müller 2017), usethis (Wickham and Bryan 2019), drake (Landau 2020))
  • Several helpful packages for interactive, yet reproducible research in RStudio (citr (Aust 2019), gramr (Dumas, Marwick, and Shotwell 2020), questionr (Barnier, Briatte, and Larmarange 2018), esquisse (Meyer and Perrier 2020))
  • Creating powerful plots using ggstatsplot (Patil 2020)
  • Create research process plots using DiagrammeR (Iannone 2020)

7 Procedure

Process diagrams such as the one in Figure 7.1 can easily be created using the DiagrammeR (Iannone 2020) package.

library(DiagrammeR)

grViz(diagram = "
      digraph boxes_and_circles {
      
      graph [rankdir = LR]
      
      node [shape = box
            fontname = Helvetica
            ]
      'Setup OSF Project Site'
      Test
      
      node [shape = circle]
      
      Start
      
      edge []
      
      Start->'Setup OSF Project Site';
      'Setup OSF Project Site'->Test;
      }
      ")

Figure 7.1: Example

7.1 Separation of Analysis and Data-Collection

7.2 Anonymization of Raw Data

  • Option 1: sdcMicro
  • Option 2: anonymizer

7.3 Preregistration

8 Discussion

9 Data

On this sub-page you can find the data used as a downloadable file (CSV, Excel, or PDF).

library(DT)  # provides datatable()

data_df <- iris

datatable(data_df,
          filter = list(position = "top", clear = TRUE, plain = FALSE),
          extensions = c("Buttons", "FixedColumns"),
          options = list(dom = "Bfrtip",
                         buttons = c("copy", "csv", "excel", "pdf"),
                         scrollX = TRUE,
                         fixedColumns = TRUE))

# rmdtemplates::line_cite(pkgs)  # This creates a single line citing all packages
rmdtemplates::list_cite(pkgs)    # This creates a 'tightlist' of all packages

10 Used Packages

We used the following packages to create this document:

  • Package: knitr by Xie (2020)
  • Package: tidyverse by Wickham (2019)
  • Package: rmdformats by Barnier (2019)
  • Package: kableExtra by Zhu (2019)
  • Package: scales by Wickham and Seidel (2019)
  • Package: psych by Revelle (2020)
  • Package: rmdtemplates by Calero Valdez (2020)
  • Package: sdcMicro by Templ, Meindl, and Kowarik (2020)
  • Package: webshot by Chang (2019)
  • Package: here by Müller (2017)
  • Package: DiagrammeR by Iannone (2020)
  • Package: citr by Aust (2019)
  • Package: drake by Landau (2020)
  • Package: esquisse by Meyer and Perrier (2020)
  • Package: usethis by Wickham and Bryan (2019)
  • Package: gramr by Dumas, Marwick, and Shotwell (2020)
  • Package: questionr by Barnier, Briatte, and Larmarange (2018)
  • Package: ggstatsplot by Patil (2020)

References

Aust, Frederik. 2019. Citr: RStudio Add-in to Insert Markdown Citations. https://CRAN.R-project.org/package=citr.

Baker, Monya. 2016. “1,500 Scientists Lift the Lid on Reproducibility.” Nature 533 (7604): 452–54.

Barnier, Julien. 2019. Rmdformats: HTML Output Formats and Templates for ’Rmarkdown’ Documents. https://CRAN.R-project.org/package=rmdformats.

Barnier, Julien, François Briatte, and Joseph Larmarange. 2018. Questionr: Functions to Make Surveys Processing Easier. https://CRAN.R-project.org/package=questionr.

Bryan, Jennifer. 2018. “Excuse Me, Do You Have a Moment to Talk About Version Control?” The American Statistician 72 (1): 20–27.

Calero Valdez, André. 2020. Rmdtemplates: Rmdtemplates - an Opinionated Collection of Rmarkdown Templates. https://github.com/statisticsforsocialscience/rmd_templates.

Chang, Winston. 2019. Webshot: Take Screenshots of Web Pages. https://CRAN.R-project.org/package=webshot.

Colquhoun, David. 2017. “The Reproducibility of Research and the Misinterpretation of P-Values.” Royal Society Open Science 4 (12): 171085.

Dumas, Jasmine, Ben Marwick, and Gordon Shotwell. 2020. Gramr: The Grammar of Grammar. https://github.com/ropenscilabs/gramr.

Gentleman, Robert, and Duncan Temple Lang. 2007. “Statistical Analyses and Reproducible Research.” Journal of Computational and Graphical Statistics 16 (1): 1–23.

Head, Megan L, Luke Holman, Rob Lanfear, Andrew T Kahn, and Michael D Jennions. 2015. “The Extent and Consequences of P-Hacking in Science.” PLoS Biology 13 (3): e1002106.

Iannone, Richard. 2020. DiagrammeR: Graph/Network Visualization. https://CRAN.R-project.org/package=DiagrammeR.

Kerr, Norbert L. 1998. “HARKing: Hypothesizing After the Results Are Known.” Personality and Social Psychology Review 2 (3): 196–217.

Landau, William Michael. 2020. Drake: A Pipeline Toolkit for Reproducible Computation at Scale. https://CRAN.R-project.org/package=drake.

Marwick, Ben, Carl Boettiger, and Lincoln Mullen. 2018. “Packaging Data Analytical Work Reproducibly Using R (and Friends).” The American Statistician 72 (1): 80–88.

Meyer, Fanny, and Victor Perrier. 2020. Esquisse: Explore and Visualize Your Data Interactively. https://CRAN.R-project.org/package=esquisse.

Meyers, Natalie K. 2017. “Reproducible Research and the Open Science Framework.” OSF. osf.io/458u9.

Müller, Kirill. 2017. Here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.

Open Science Collaboration, and others. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716.

Patil, Indrajeet. 2020. Ggstatsplot: ’Ggplot2’ Based Plots with Statistical Details. https://CRAN.R-project.org/package=ggstatsplot.

Revelle, William. 2020. Psych: Procedures for Psychological, Psychometric, and Personality Research. https://CRAN.R-project.org/package=psych.

Simonsohn, Uri, Leif D Nelson, and Joseph P Simmons. 2014. “P-Curve and Effect Size: Correcting for Publication Bias Using Only Significant Results.” Perspectives on Psychological Science 9 (6): 666–81.

Templ, Matthias, Bernhard Meindl, and Alexander Kowarik. 2020. SdcMicro: Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation. https://CRAN.R-project.org/package=sdcMicro.

Wickham, Hadley. 2019. Tidyverse: Easily Install and Load the ’Tidyverse’. https://CRAN.R-project.org/package=tidyverse.

Wickham, Hadley, and Jennifer Bryan. 2019. Usethis: Automate Package and Project Setup. https://CRAN.R-project.org/package=usethis.

Wickham, Hadley, and Dana Seidel. 2019. Scales: Scale Functions for Visualization. https://CRAN.R-project.org/package=scales.

Wilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K Teal. 2017. “Good Enough Practices in Scientific Computing.” PLoS Computational Biology 13 (6): e1005510.

Xie, Yihui. 2020. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.

Zhu, Hao. 2019. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.